To successfully annotate the Arabic speech content found in open-domain media broadcasts, it is essential to be able to process a diverse set of Arabic dialects. The 2017 Multi-Genre Broadcast challenge (MGB-3) offered two tasks: Arabic speech recognition and Arabic Dialect Identification (ADI). In this paper, we describe our efforts to create an ADI system for the MGB-3 challenge, with the goal of distinguishing among four major Arabic dialects, as well as Modern Standard Arabic. Our research focused on dialect variability and on domain mismatches between the training and test domains. To achieve a robust ADI system, we explored both Siamese neural network models, to learn similarities and dissimilarities among Arabic dialects, and i-vector post-processing, to adapt to domain mismatches. Both acoustic and linguistic features were used for the final MGB-3 submissions, with the best primary system achieving 75% accuracy on the official 10-hour test set.